Global Optimization of Minority Game by Smart Agents 
We propose a new model of the minority game with so-called smart agents such that the standard
deviation $\sigma^2$ and the total loss in this model reach their theoretical minimum values in the
long-time limit. The smart agents use a trial-and-error method to make their choices but bring global
optimization to the system, which suggests that economic systems may have the ability to self-organize
into a highly optimized state through agents who, because of their limited knowledge and capabilities,
are forced to make decisions based on inductive thinking. When other kinds of agents are also present,
the experimental results and analyses show that the smart agents can gain profits from producers and
are much more competent than the noise traders and the conventional agents of the original minority game.
I. INTRODUCTION

The minority game (MG) model, introduced by Challet and Zhang in 1997 as a
model of competition for limited resources [1], has attracted much
attention in recent years. The basic scenario is easy to explain: there is
a population of $N$ players who, at each time step, have to choose either
0 or 1. Those who are in the minority win and the others lose (to avoid
ambiguities, $N$ is chosen to be odd). The agents make their decisions
based on the most recent $m$ outcomes, so there are $2^m$ different
histories. A strategy is defined as a table of $2^m$ choices (either 0 or
1) for the $2^m$ corresponding histories, so that there are $2^{2^m}$
different strategies in the strategy space. Each agent randomly picks
$s > 1$ strategies from the strategy space at the beginning of the MG.
Each strategy is assigned an integer point score, which initially takes
the value 0 and increases by 1 at each time step at which the strategy
predicts the result correctly. Each agent uses the strategy with the
highest score among his $s$ strategies; if several strategies share the
highest score, one of them is chosen randomly. A very important quantity
in this model is the overall loss, defined as



$$L(t) = N_{\rm loss}(t) - N_{\rm win}(t) \geq 1, \tag{1}$$



where $N_{\rm loss}$ and $N_{\rm win}$ are, respectively, the numbers of
losers and winners at time $t$. Clearly, the smaller $L(t)$ is, the better
the system performs. Another related quantity is called the standard
deviation and is defined as



$$\sigma^2(t) = \left(n_0(t) - \bar{n}\right)^2, \tag{2}$$



where $n_0(t)$ is the number of agents who choose 0 and
$\bar{n} = N/2$. It is easy to see that $\sigma^2(t) = L^2(t)/4$ and that,
theoretically, the minimum value of $\sigma^2(t)$ is 0.25.
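
As a point of reference, the original game described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for any result in this paper: the function name `run_minority_game`, its default parameters, the use of NumPy, and the random tie-breaking trick are all assumptions made here. It records $L(t)$ and $\sigma^2(t)$ so that the relation $\sigma^2(t) = L^2(t)/4$ can be checked directly.

```python
import numpy as np


def run_minority_game(n_agents=101, m=3, s=2, steps=2000, seed=0):
    """Sketch of the original minority game; returns arrays L(t), sigma^2(t)."""
    if n_agents % 2 == 0:
        raise ValueError("n_agents must be odd so that a minority always exists")
    rng = np.random.default_rng(seed)
    n_hist = 2 ** m
    # strategies[i, k, h]: choice (0 or 1) that strategy k of agent i prescribes
    # for history h. Drawing every entry uniformly is the same as drawing a
    # table uniformly from the 2^(2^m) possible strategies.
    strategies = rng.integers(0, 2, size=(n_agents, s, n_hist))
    points = np.zeros((n_agents, s), dtype=np.int64)
    history = int(rng.integers(0, n_hist))
    agents = np.arange(n_agents)
    loss = np.empty(steps, dtype=np.int64)
    sigma2 = np.empty(steps)
    for t in range(steps):
        # Play the highest-scoring strategy; a random fraction in [0, 1)
        # added to the integer scores breaks ties uniformly at random.
        best = np.argmax(points + rng.random((n_agents, s)), axis=1)
        choices = strategies[agents, best, history]
        n1 = int(choices.sum())
        n0 = n_agents - n1
        minority = 0 if n0 < n1 else 1
        loss[t] = abs(n0 - n1)                  # N_loss - N_win
        sigma2[t] = (n0 - n_agents / 2) ** 2    # (n_0 - N/2)^2
        # Every strategy that predicted the minority side gains one point.
        points += strategies[:, :, history] == minority
        # Shift the newest outcome into the m-bit history.
        history = ((history << 1) | minority) % n_hist
    return loss, sigma2
```

For example, `loss, sigma2 = run_minority_game()` returns two arrays of length `steps`; every entry of `loss` is an odd integer of at least 1, so `sigma2` is never below 0.25.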

One focus of attention is the problem of how to improve the performance of
the system, i.e., how to reduce $\sigma^2$. Recently, some new kinds of
agents have been introduced [2,3], by which the overall performance of the
system is improved. A further question is whether it is possible to
achieve global optimization in the framework of the MG model, assuming
that agents try to outsmart each other for their own selfish gain and act
on the basis of inductive thinking [4].

Recently, significant progress was made by Reents et al., who proposed a
stochastic minority game model in which $\sigma^2$ is minimized [5]. In
their model, an agent does not change his choice at the next time step if
he wins in the present turn; otherwise, he changes his choice with
probability $p$. The value of $p$ is the same for all agents. For
$p \ll 1/N$, Reents et al. found that $\sigma^2 \approx 0.25$. However,
agents in real-life systems do not have this knowledge: they do not know
how to select a value of $p$, and may not even know the total number of
agents $N$. Thus the model of Reents et al. may not be appropriate for
systems consisting of agents with inductive thinking.
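
The stochastic rule just described is simple enough to sketch directly. The following Python fragment is an illustration under stated assumptions, not the implementation of Reents et al.: the function name `run_stochastic_mg`, the default choice $p = 0.1/N$ (one arbitrary way of satisfying $p \ll 1/N$), and the run length are all choices made here.

```python
import numpy as np


def run_stochastic_mg(n_agents=101, p=None, steps=5000, seed=0):
    """Winners keep their choice; each loser flips with the same probability p."""
    if n_agents % 2 == 0:
        raise ValueError("n_agents must be odd")
    if p is None:
        p = 0.1 / n_agents  # assumed example value with p << 1/N
    rng = np.random.default_rng(seed)
    choices = rng.integers(0, 2, size=n_agents)
    sigma2 = np.empty(steps)
    for t in range(steps):
        n1 = int(choices.sum())
        n0 = n_agents - n1
        minority = 0 if n0 < n1 else 1
        sigma2[t] = (n0 - n_agents / 2) ** 2
        losers = choices != minority
        # Only losers may change side, each independently with probability p.
        flip = losers & (rng.random(n_agents) < p)
        choices = np.where(flip, 1 - choices, choices)
    return sigma2
```

With a small $p$ the late-time values of `sigma2` should sit at or very close to 0.25, whereas a large value such as `p=0.5` makes the population overshoot at every step. The point of the criticism above is visible in the signature: `p` has to be supplied from outside, using knowledge of `n_agents`.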

Metzler and Horn introduced evolution into the stochastic minority game
model [6]. As in the evolutionary minority game model [7], each agent $i$
is equipped with a probability $p_i(t)$ and a score $S_i$. The score $S_i$
increases by 1 if the agent wins and decreases by 1 if the agent loses.
When $S_i < d < 0$, the agent is removed and replaced by a new agent with
a reset score $S_i = 0$. If $p_i(t)$ of the new agent is drawn at random
from (0, 1), the average value of $p_i(t)$ in the final stationary state
is found to be of order 1 and thus $\sigma^2 \sim O(N^2)$. They also
discussed the situation in which the new agent chooses $p_i(t)$ by copying
the value $p_j(t)$ of another, randomly selected agent. Within this
scheme, it is possible to see that $p_i(t) \sim O(1/N)$ and
$\sigma^2 \sim 1$ after a sufficiently long time. However, it is still
unreasonable to assume that an agent knows the information of all other
agents. Furthermore, $p$ is of order $1/N$, and thus $\sigma^2$ is greater
than 0.25 in the final state. The best solution is therefore still not
achieved in their model.

In the present paper, we propose a new model of the minority game with
so-called smart agents such that the standard deviation $\sigma^2$ and the
total loss in this model reach their theoretical minimum values in the
long-time limit. The smart agents act on the basis of inductive thinking
but bring global optimization to the system. Experimental results and
analyses show that, when other kinds of agents are also present, the smart
agents can gain profits from producers and are much more competent than
the noise traders and the conventional agents of the original minority
game.



II. MODEL AND NUMERICAL SIMULATION 

Our model consists of $N$ agents, with $N$ an odd integer. Each agent has
only one strategy, which evolves according to the following rule. Suppose
that at a given time step $t$ the memory (history) is $\mu$ and the
strategy of the $i$-th agent is $S_i(t,\nu)$ for $\nu = 0, \ldots, 2^m - 1$.
Each agent also has a probability function $p_i(t,\nu)$ for
$i = 1, \ldots, N$ and $\nu = 0, \ldots, 2^m - 1$. If the $i$-th agent wins
at $t$, his strategy is not changed. Otherwise, with probability
$1 - p_i(t,\mu)$ the strategy $S_i(t,\nu)$ is not changed, and with
probability $p_i(t,\mu)$ the entry for the current history is flipped,
$S_i(t+1,\mu) = 1 - S_i(t,\mu)$, while $S_i(t+1,\nu) = S_i(t,\nu)$ for all
other $\nu \neq \mu$.

The initial value of $p_i(t,\nu)$ is randomly selected in (0, 1) and
evolves by a self-teaching mechanism, which is the simplest
trial-and-error method. For a given time step $t$ with history $\mu$,
consider the last time step $t'$ at which the memory was also $\mu$. If
agent $i$ won at $t'$, or if he lost but did not change $S_i(t',\mu)$, then
no change occurs. Otherwise, $p_i(t,\mu)$ changes according to the
following rule [8]:



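$$p_i(t+1,\mu) = \begin{cases} \min\bigl(1,\, 2p_i(t,\mu)\bigr), & \text{agent } i \text{ wins at time } t, \\ p_i(t,\mu)/2, & \text{agent } i \text{ loses at time } t. \end{cases} \tag{3}$$

No changes occur for any $p_i(t,\nu)$ with $\nu \neq \mu$.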
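
The update rules above can be sketched as follows. This is one possible reading of the model, written for illustration only: the function name `run_smart_agents`, the array names, the bookkeeping flag `changed`, and the ordering of operations within a time step (flip decisions use $p_i(t,\mu)$ before Eq. (3) is applied) are assumptions made here, not details confirmed by the text.

```python
import numpy as np


def run_smart_agents(n_agents=101, m=0, steps=20000, seed=0):
    """Sketch of the smart-agent dynamics; returns the time series sigma^2(t)."""
    if n_agents % 2 == 0:
        raise ValueError("n_agents must be odd")
    rng = np.random.default_rng(seed)
    n_hist = 2 ** m
    strategy = rng.integers(0, 2, size=(n_agents, n_hist))  # S_i(t, nu)
    prob = rng.random((n_agents, n_hist))                    # p_i(t, nu)
    # changed[i, nu] is True if agent i lost AND flipped S_i(., nu) the last
    # time history nu occurred, i.e. that flip is still waiting to be judged.
    changed = np.zeros((n_agents, n_hist), dtype=bool)
    mu = int(rng.integers(0, n_hist))
    sigma2 = np.empty(steps)
    for t in range(steps):
        choices = strategy[:, mu]
        n1 = int(choices.sum())
        n0 = n_agents - n1
        minority = 0 if n0 < n1 else 1
        sigma2[t] = (n0 - n_agents / 2) ** 2
        wins = choices == minority
        # Losers flip their entry for history mu with probability p_i(t, mu).
        flip = ~wins & (rng.random(n_agents) < prob[:, mu])
        # Eq. (3): agents whose previous flip under mu is being judged double
        # p (capped at 1) after a win and halve it after a loss.
        judged = changed[:, mu]
        updated = np.where(wins,
                           np.minimum(1.0, 2.0 * prob[:, mu]),
                           prob[:, mu] / 2.0)
        prob[:, mu] = np.where(judged, updated, prob[:, mu])
        strategy[:, mu] = np.where(flip, 1 - choices, choices)
        changed[:, mu] = flip
        # Shift the newest outcome into the m-bit history (stays 0 for m = 0).
        mu = ((mu << 1) | minority) % n_hist
    return sigma2
```

Note that no agent is told $N$ or a target value of $p$: every quantity used in the update is the agent's own win/loss record. Calling `run_smart_agents(m=0)` and averaging the tail of the returned array gives a quick check of whether a given run has settled near 0.25.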

Note that the evolution of $p_i(t,\mu)$ for different memories is
essentially decoupled in our model. Therefore, mathematically speaking,
the $m \neq 0$ case is a trivial generalization of the $m = 0$ case. The
reason we introduce different memories here is to mimic the behavior of
agents in real-life markets, who study the selection rules for different
memories in order to find some regularities.

Figure 1 shows the simulation results, which indicate that the system
reaches global optimization after a sufficiently long time. We have
checked that the time evolution of $\sigma^2(t)$ for cases with more
agents and larger memory has the same properties as for $N = 101$ and
$m = 0, 1, 2$.

FIG. 1: Time evolution of $\sigma^2(t)$ for $N = 101$ smart agents
with $m = 0$ (a), 1 (b), and 2 (c). The value of $\sigma^2(t)$ shown in
this figure is the average of 10 independent experiments and
the horizontal line represents $\sigma^2 = 0.25$.

Fig. 2 presents the log-log plot of the time dependence of
$G(t) = \sum_{i=1}^{N} p_i(t)$ and $H(t) = \prod_{i=1}^{N} p_i(t)$ for
$N = 101$ and $m = 0$. The results show that $G(t)$ has a power-law
dependence on time with exponent $\gamma \approx -1$ when $t$ is large,
which suggests that $G(t) \to 0$ as $t \to \infty$; it is thus reasonable
to suppose $p_i(t) \ll 1/N$ when $t$ is sufficiently large. In this case,
at most one agent changes his strategy at each time step (the probability
of two or more agents changing their strategies at the same time is
negligibly small), so the number of agents on the majority side is always
$(N+1)/2$. Therefore, an agent who changes his strategy moves from the
losing side to what then becomes the losing side, and his $p_i(t)$ is
reduced by a factor of 2. Since $p_i(t) \ll 1/N$, the probability that one
agent will change his strategy is



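$$\nu = 1 - \prod_{i \in W_l(t)} \bigl(1 - p_i(t)\bigr) \approx \sum_{i \in W_l(t)} p_i(t) \sim G(t),$$

where $W_l(t)$ is the set of losers at time $t$. Then we have
the iterative equations for $G(t)$ and $H(t)$:

$$G(t+1) = \nu\,\frac{2N-1}{2N}\,G(t) + (1-\nu)\,G(t), \tag{4}$$

$$H(t+1) = \nu\,\frac{H(t)}{2} + (1-\nu)\,H(t). \tag{5}$$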



According to Eqs. (4) and (5), one finds that $G(t) \sim t^{-1}$
and $H(t) \sim t^{-N}$ [10], which is consistent with the
simulation results shown in Figure 2.
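
The two power laws can be recovered with a short continuum estimate. The sketch below assumes that the recursions read $G(t+1) = G(t) - \nu G(t)/(2N)$ (one agent with a typical $p_i \approx G/N$ halves his probability) and $H(t+1) = (1 - \nu/2)\,H(t)$, and that $\nu = c\,G(t)$ with a constant $c$ of order 1; the constant $c$ is introduced here for illustration only and drops out of the exponent of $H$.

```latex
\begin{align}
G(t+1) - G(t) = -\frac{\nu}{2N}\,G(t)
  \;&\Longrightarrow\;
  \frac{dG}{dt} \approx -\frac{c}{2N}\,G^{2}
  \;\Longrightarrow\;
  G(t) \simeq \frac{2N}{c\,t} \sim t^{-1}, \\
\ln H(t+1) - \ln H(t) = \ln\!\Bigl(1 - \frac{\nu}{2}\Bigr)
  \approx -\frac{c}{2}\,G(t) = -\frac{N}{t}
  \;&\Longrightarrow\;
  H(t) \sim t^{-N}.
\end{align}
```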
FIG. 2: Time dependence of $G(t)$ (a) and $H(t)$ (b), where
$N = 101$ and $m = 0$. The slopes of the two curves in panels (a) and (b)
are $-1.01$ ($\approx -1$) and $-102$ ($\approx -N$), respectively.



III. SMART AGENTS IN MIXED MARKET 

Challet et al. classified agents into three different types [11]:
producers, who have only one strategy; speculators (the conventional
agents of the original minority game), who have two or more strategies;
and noise traders, who make their choices by random tosses. In this
section, we investigate how smart agents perform in a mixed market [12].

First, let us look into how the smart agents compete with the producers.
Assume that there are $N_p$ producers and $N_s$ smart agents, with
$N_p + N_s$ an odd integer; each producer has only one fixed strategy. For
simplicity, we first discuss the case $m = 0$. Suppose $N_{p0}$ producers
always choose 0 and $N_{p1}$ producers always choose 1. If
$\Delta = N_{p0} - N_{p1} > N_s$ ($< -N_s$), then all $N_s$ smart agents
must choose 1 (0) in the equilibrium state and win at each time step. When
$N_s > \Delta > 0$ (the case $N_s > -\Delta > 0$ is analogous), the
situation is slightly more complicated. From the discussion in Section II,
it is not difficult to see that the overall loss of the $N_p + N_s$ agents
is minimized in the equilibrium state. Namely, there will be either
$(N_s - \Delta + 1)/2$ smart agents choosing 0 and $(N_s + \Delta - 1)/2$
smart agents choosing 1, or $(N_s - \Delta - 1)/2$ smart agents choosing 0
and $(N_s + \Delta + 1)/2$ smart agents choosing 1. In the former case the
agents choosing 0 are losers, while in the latter case the agents choosing
0 are winners. The equilibrium state is described by transitions between
these two cases.



Before it switches to the other case, the equilibrium state stays in one
case for a period of time, called the lifetime. The lifetimes of the two
cases are different. Assume that the probability $p_i$ of agent $i$ is
independent of $i$; then the lifetime of the former case is
$\tau_1 = 2/[(N_s - \Delta + 1)\langle p \rangle]$ and that of the latter
case is $\tau_2 = 2/[(N_s + \Delta + 1)\langle p \rangle]$, where
$\langle p \rangle$ denotes the average value of $p_i$. The overall gain
of the smart agents at each time step is equal to



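$$E = \frac{1}{\tau_1 + \tau_2}\left[\left(\frac{N_s + \Delta - 1}{2} - \frac{N_s - \Delta + 1}{2}\right)\tau_1 + \left(\frac{N_s - \Delta - 1}{2} - \frac{N_s + \Delta + 1}{2}\right)\tau_2\right] = \frac{1}{N_s + 1}\left[\Delta^2 - 1 - N_s\right]. \tag{6}$$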



Therefore, $E > 0$ when $\Delta < N_s < \Delta^2 - 1$. The average
profit gained by each smart agent at each time step is



$$\frac{E}{N_s} = \frac{1}{N_s (N_s + 1)}\left[\Delta^2 - 1 - N_s\right]. \tag{7}$$



According to Eq. (7), when $N_s < \Delta^2 - 1$, each smart agent can gain
profits from the producers. Suppose the number of smart agents $N_s$ is
not fixed. If $N_s < \Delta^2 - 1$, some new smart agents, if available,
will join the game, since they can gain profits from the producers. Thus
there will eventually be $N_s \approx \Delta^2 - 1$ smart agents in the
market, whose profits are approximately equal to 0 with slight
fluctuations. This process can be considered an example of the efficient
market hypothesis (EMH), which has been hotly debated in recent years
[13]. In real-life financial markets, however, the number of producers
varies, so the equilibrium state can rarely be reached.
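
The algebra leading from the lifetime-weighted average to the closed form $(\Delta^2 - 1 - N_s)/(N_s + 1)$ can be checked with exact rational arithmetic. The sketch below is only a consistency check of that step under the same assumption as in the text (a common $\langle p \rangle$ for all smart agents, $m = 0$, $N_s > \Delta > 0$); the function names and the default value of `mean_p` are hypothetical.

```python
from fractions import Fraction


def gain_from_lifetimes(delta, n_s, mean_p=Fraction(1, 100)):
    """Lifetime-weighted gain per time step of all smart agents (m = 0)."""
    # Lifetimes of the two equilibrium configurations.
    tau1 = Fraction(2) / ((n_s - delta + 1) * mean_p)  # side 0 is the majority
    tau2 = Fraction(2) / ((n_s + delta + 1) * mean_p)  # side 1 is the majority
    # Winners minus losers among the smart agents in each configuration.
    gain1 = Fraction(n_s + delta - 1, 2) - Fraction(n_s - delta + 1, 2)  # delta - 1
    gain2 = Fraction(n_s - delta - 1, 2) - Fraction(n_s + delta + 1, 2)  # -(delta + 1)
    return (gain1 * tau1 + gain2 * tau2) / (tau1 + tau2)


def gain_closed_form(delta, n_s):
    """Closed form (delta^2 - 1 - n_s) / (n_s + 1)."""
    return Fraction(delta * delta - 1 - n_s, n_s + 1)
```

For instance, $\Delta = 5$ and $N_s = 10$ give $E = 14/11$ by either route, while $\Delta = 5$ and $N_s = 30$ give a negative value, in line with the condition $N_s < \Delta^2 - 1$. The result does not depend on `mean_p`, which cancels in the ratio of lifetimes.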

When $m > 0$, the number of possible histories is $2^m > 1$. For a given
history $\mu$, suppose $N_{p0}(\mu)$ producers always choose 0 and
$N_{p1}(\mu)$ producers always choose 1. Then
$\Delta(\mu) = N_{p0}(\mu) - N_{p1}(\mu)$ is a function of $\mu$. Since
different histories $\mu$ are essentially decoupled in our model and the
number of smart agents $N_s$ is fixed, there are three possible cases
under history $\mu$: (i) $|\Delta(\mu)| > N_s$, each smart agent gains one
point at each time step; (ii) $\Delta^2(\mu) - 1 > N_s > |\Delta(\mu)|$,
the smart agents gain profit from the producers on average; (iii)
$\Delta^2(\mu) - 1 < N_s$, the smart agents cannot gain profit and are
characterized by the overall loss described by Eq. (1).
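
The three cases can be written as a small classification helper. This is an illustrative sketch only: the function name `classify_history` and the returned labels are hypothetical, and the extra label `"break-even"` for the boundary $N_s = \Delta^2(\mu) - 1$ (where Eq. (6) gives $E = 0$) is an addition made here, since the text does not name that case.

```python
def classify_history(delta, n_s):
    """Return which of the cases (i)-(iii) applies for a given Delta(mu), N_s."""
    # N_p + N_s is odd and N_p has the same parity as Delta, so N_s + Delta
    # must be odd; in particular N_s can never equal |Delta|.
    if (n_s + delta) % 2 == 0:
        raise ValueError("n_s + delta must be odd")
    a = abs(delta)
    if a > n_s:
        return "i"    # every smart agent wins at every step
    if a * a - 1 > n_s:
        return "ii"   # smart agents profit from producers on average
    if a * a - 1 < n_s:
        return "iii"  # no profit for the smart agents
    return "break-even"  # n_s == delta^2 - 1: average gain is zero
```

With the parameters quoted for Figure 3 ($\Delta = 200$, $N_s = 801$), `classify_history(200, 801)` returns `"ii"`, since $200 < 801 < 39999$.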

The above picture is confirmed by the numerical simulation results shown
in Figure 3(a). One can see that $\sigma^2$ decreases as $t$ increases and
decays to 0.25 when $t$ is sufficiently large. Figure 3(b) plots the time
dependence of the mean gain of the smart agents:



$$A_s(t) = \frac{N_{\rm swin}(t) - N_{\rm slose}(t)}{N_s},$$



where $N_{\rm swin}$ and $N_{\rm slose}$ denote the numbers of smart
agents who win and lose, respectively. Initially $A_s(t)$ is negative, but
as $t$ increases $A_s(t)$ becomes positive. Therefore, the smart agents
can gain profits from producers in the regime $\Delta^2(\mu) - 1 > N_s$.
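
The mean gain of a group of agents at a single time step is straightforward to compute from the vector of all choices. The helper below is a minimal sketch; the name `mean_gain` and the boolean-mask interface are assumptions made here, and the same function applies to any subgroup of a mixed market.

```python
import numpy as np


def mean_gain(choices, group):
    """(N_win - N_lose) / N_group for the agents selected by the mask `group`."""
    choices = np.asarray(choices)
    group = np.asarray(group, dtype=bool)
    n1 = int(choices.sum())
    n0 = choices.size - n1
    if n0 == n1:
        raise ValueError("total number of agents must be odd")
    minority = 0 if n0 < n1 else 1
    n_group = int(group.sum())
    n_win = int((choices[group] == minority).sum())
    n_lose = n_group - n_win
    return (n_win - n_lose) / n_group
```

For the five choices `[0, 0, 0, 1, 1]` the minority side is 1, so a group made of the first three agents has mean gain $-1$ and a group made of the last two has mean gain $+1$.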







FIG. 3: Time evolution of $\sigma^2(t)$ (a) and $A_s(t)$ (b), where
$N_p = 200$, $N_s = 801$, $m = 1$ and $\Delta(0) = \Delta(1) = 200$. The
values of $\sigma^2(t)$ and $A_s(t)$ shown in these two panels are the
average of 32 independent experiments and the horizontal line
in panel (a) represents $\sigma^2 = 0.25$.

FIG. 4: Time evolution of $\sigma^2(t)$ (a), $A_s(t)$ (b), and $A_n(t)$ (c),
where $N_s = 801$, $N_n = 200$ and $m = 1$. The values of $\sigma^2(t)$,
$A_s(t)$ and $A_n(t)$ shown in these three panels are the average of
32 independent experiments and the horizontal line in panel
(a) represents $\sigma^2 = 0.25$.

FIG. 5: Time evolution of $\sigma^2(t)$ (a), $A_s(t)$ (b), and $A_m(t)$ (c),
where $N_s = 51$, $N_m = 50$, $m = 3$ and the number of strategies
used by the conventional agents is 2. The values of $\sigma^2(t)$, $A_s(t)$
and $A_m(t)$ shown in these three panels are the average of 32
independent experiments and the horizontal line in panel (a)
represents $\sigma^2 = 0.25$.



Second, let us consider the case in which noise traders and smart agents
are present. Assume that there are $N_n$ noise traders and $N_s$ smart
agents, with $N_n + N_s$ an odd integer. Figure 4(a) plots the time
dependence of $\sigma^2$: one can see that $\sigma^2$ decreases as $t$
increases but does not reach the theoretical optimum 0.25 in the long-time
limit. This result is not difficult to understand, since the presence of
noise traders brings more fluctuations into the system. Figures 4(b) and
4(c) show the time dependence of $A_s$ and $A_n$, respectively, where
$A_n$ is the mean gain of the noise traders:



$$A_n(t) = \frac{N_{\rm nwin}(t) - N_{\rm nlose}(t)}{N_n}.$$



Here $N_{\rm nwin}$ and $N_{\rm nlose}$ denote the numbers of noise
traders who win and lose, respectively. Clearly, the smart agents perform
much better than the noise traders do.

Finally, we have studied the case in which conventional agents, who act
according to the original minority game model [1], and smart agents are
present. Assume that there are $N_s$ smart agents and $N_m$ conventional
agents, with $N_s + N_m$ an odd integer. Figure 5(a) shows the time
dependence of $\sigma^2$. One sees that $\sigma^2$ decreases with time but
again does not reach the theoretical optimum 0.25 in the long-time limit.
This result implies that the conventional agents also introduce
fluctuations into the system, although in this case their magnitude is
smaller than for the noise traders. In Figures 5(b) and 5(c), we report
the time dependence of $A_s$ and $A_m$, respectively, where $A_m$ is the
mean gain of the conventional agents:



$$A_m(t) = \frac{N_{\rm mwin}(t) - N_{\rm mlose}(t)}{N_m}.$$



Here $N_{\rm mwin}$ and $N_{\rm mlose}$ denote the numbers of conventional
agents who win and lose, respectively. From these two figures, one
immediately sees that the smart agents perform much better than the
conventional agents. This is evidence that it may not be reasonable to use
the conventional agents to mimic actual traders in real-life markets.



IV. DISCUSSION AND CONCLUSION 

We have proposed a new model of the minority game with so-called smart
agents, who use a trial-and-error method to make their choices. When only
the smart agents are present, it is found that the overall loss is
minimized to the theoretical limit, with $\sigma^2 \to 0.25$ as
$t \to \infty$. Notice that although the smart agents are independent and
are only trying to do their best for their own selfish gain on the basis
of inductive thinking, global optimization is achieved in our model. The
result suggests that economic systems may have the ability to
self-organize into a highly optimized state through agents who, because of
their limited knowledge and capabilities, are forced to make decisions
based on inductive thinking.

In the mixed-market cases, when the model consists of the smart agents and
producers with only one fixed strategy, we have found that, under certain
circumstances, the smart agents can gain profit from the producers. Also,
the overall loss of the producers and the smart agents is minimized. When
the model consists of the smart agents and noise traders who choose
randomly at each round, it is found that the smart agents also cooperate
very well, so that the overall loss of the smart agents becomes very small
when the time is sufficiently large.

It is worth emphasizing that the smart agents perform much better than the
conventional agents in a mixed market. Imagine an agent trying to figure
out the regularity of the financial market. Assume that at time $t_1$ he
has selection rules for all possible histories, i.e., he has a strategy.
At a later time $t_2$, he finds that the selection rules for some
histories do not give profits. He may therefore change the selection rules
for these histories, but not for the other histories, which still give him
profits. This is in contrast with the original MG model, in which an agent
selects the strategy with the highest virtual point score. When he changes
strategy, he may change many selection rules even though they still make
profits. We think that this is the reason why the conventional agents are
less competent than the smart agents.